Parameter-efficient fine-tuning of large-scale pre-trained language models
Authors
Abstract
With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, it demonstrates that large-scale models could be effectively stimulated by the optimization of a few parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term, 'delta-tuning', where 'delta', a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are 'changed' during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study over 100 natural language processing tasks and investigate various aspects of delta-tuning. Through comprehensive analysis, we demonstrate the practical properties of delta-tuning in adapting large PLMs.
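To make the core idea concrete, below is a minimal sketch of delta-tuning: the pre-trained weights are frozen and only a small trainable "delta" module is optimized. The low-rank update (LoRA-style) shown here is just one illustrative design among the many delta-tuning methods the paper categorizes; the class name, rank, and layer sizes are assumptions for the example, not the paper's prescribed implementation.

```python
# A minimal, hypothetical sketch of the delta-tuning idea: freeze the
# pre-trained weights and train only a small low-rank "delta" added to them.
import torch
import torch.nn as nn


class LowRankDelta(nn.Module):
    """Wraps a frozen linear layer and adds a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze pre-trained weights
            p.requires_grad = False
        self.delta_a = nn.Parameter(torch.zeros(rank, base.in_features))
        self.delta_b = nn.Parameter(torch.zeros(base.out_features, rank))
        # small random init for A, zero init for B, so the wrapped layer
        # initially behaves exactly like the frozen pre-trained layer
        nn.init.normal_(self.delta_a, std=0.02)

    def forward(self, x):
        # frozen output + low-rank correction (the "delta")
        return self.base(x) + x @ self.delta_a.T @ self.delta_b.T


# Usage: stand-in for one projection matrix of a PLM; only the delta is trainable.
frozen = nn.Linear(768, 768)
layer = LowRankDelta(frozen, rank=8)
trainable = [p for p in layer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))           # ~12k trainable vs ~590k frozen
```

Only the roughly 12,000 delta parameters receive gradient updates, which is the storage and computation saving the abstract refers to when it describes optimizing a small portion of the model while keeping the rest fixed.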
Similar resources
Efficient and Robust Parameter Tuning for Heuristic Algorithms
The main advantage of heuristic or metaheuristic algorithms compared to exact optimization methods is their ability in handling large-scale instances within a reasonable time, albeit at the expense of losing a guarantee for achieving the optimal solution. Therefore, metaheuristic techniques are appropriate choices for solving NP-hard problems to near optimality. Since the parameters of heuristi...
Large scale structure and supersymmetric inflation without fine tuning.
We explore constraints on the spectral index n of density fluctuations and the neutrino energy density fraction Ω_HDM, employing data from a variety of large scale observations. The best fits occur for n ≈ 1 and Ω_HDM ≈ 0.15–0.30, over a range of Hubble constants 40–60 km s⁻¹ Mpc⁻¹. We present a new class of inflationary models based on realistic supersymmetric grand unified theories which d...
COMPUTATIONALLY EFFICIENT OPTIMUM DESIGN OF LARGE SCALE STEEL FRAMES
Computational cost of metaheuristic based optimum design algorithms grows excessively with structure size. This results in computational inefficiency of modern metaheuristic algorithms in tackling optimum design problems of large scale structural systems. This paper attempts to provide a computationally efficient optimization tool for optimum design of large scale steel frame structures to AISC...
Large Scale Hierarchical Neural Network Language Models
Feed-forward neural network language models (NNLMs) are known to improve both perplexity and word error rate performance for speech recognition compared with conventional ngram language models. We present experimental results showing how much the WER can be improved by increasing the scale of the NNLM, in terms of model size and training data. However, training time can become very long. We imp...
Journal
Journal title: Nature Machine Intelligence
Year: 2023
ISSN: 2522-5839
DOI: https://doi.org/10.1038/s42256-023-00626-4